
[hail] Better scaling on RVD.union #6943

Merged: 1 commit, Aug 26, 2019


tpoterba (Contributor)

Do a tree reduce instead of a linear reduce in RVD.union. This makes the Java
stack depth log2(N) instead of N, which prevents stack overflow errors
when unioning hundreds of tables together.
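For illustration, here is a minimal Scala sketch of the shape difference, assuming a toy union expression rather than Hail's real RVD types (Expr, Leaf, Union, treeReduce, and depth are hypothetical names, not Hail code). reduceLeft nests each input one level deeper than the one before it, while a pairwise tree reduce halves the number of values each round:

```scala
// Minimal sketch of a tree (pairwise) reduce versus reduceLeft.
// Expr, Leaf, Union, depth, and treeReduce are illustrative names, not Hail's code.
object TreeReduceSketch {
  sealed trait Expr
  final case class Leaf(name: String) extends Expr
  final case class Union(left: Expr, right: Expr) extends Expr

  // Nesting depth of a union expression: 0 for a leaf, 1 + the deeper child otherwise.
  def depth(e: Expr): Int = e match {
    case Leaf(_)     => 0
    case Union(l, r) => 1 + math.max(depth(l), depth(r))
  }

  // Combine adjacent pairs round by round until one value remains. Each round
  // halves the count, so the result is nested at most ceil(log2(N)) levels deep.
  def treeReduce[A](xs: IndexedSeq[A])(op: (A, A) => A): A = {
    require(xs.nonEmpty, "treeReduce of an empty sequence")
    var level: IndexedSeq[A] = xs
    while (level.length > 1) {
      val current = level
      level = (0 until current.length by 2).map { i =>
        if (i + 1 < current.length) op(current(i), current(i + 1))
        else current(i) // odd element out is carried to the next round
      }
    }
    level.head
  }
}
```

With 200 inputs, the reduceLeft result is nested 199 levels deep, while the tree-reduced result is nested 8 levels deep (ceil(log2 200) = 8). Pairing adjacent elements per round is one way to get a balanced tree; recursively splitting at the midpoint gives the same logarithmic depth.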

patrick-schultz (Collaborator)

I'm confused by the stack depth problem. reduce isn't recursive; it forwards to reduceLeft:

```scala
def reduceLeft[B >: A](op: (B, A) => B): B = {
  if (isEmpty)
    throw new UnsupportedOperationException("empty.reduceLeft")

  var first = true
  var acc: B = 0.asInstanceOf[B]

  for (x <- self) {
    if (first) {
      acc = x
      first = false
    }
    else acc = op(acc, x)
  }
  acc
}
```
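A quick way to check this point, as a hypothetical sketch (the object and method names are made up for illustration): record the JVM stack depth every time reduceLeft calls op. Because reduceLeft is a loop, every call should observe the same depth, regardless of how many elements are reduced.

```scala
// Sketch: record the JVM stack depth each time reduceLeft invokes `op`.
// ReduceLeftDepthSketch and opCallDepths are hypothetical names for illustration.
object ReduceLeftDepthSketch {
  def opCallDepths(n: Int): List[Int] = {
    val depths = List.newBuilder[Int]
    (1 to n).toList.reduceLeft { (acc: Int, x: Int) =>
      // Same depth on every call: reduceLeft iterates, it does not recurse.
      depths += Thread.currentThread.getStackTrace.length
      acc + x
    }
    depths.result()
  }
}
```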

tpoterba (Contributor, Author)

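The problem is that, in the ordered-merge usage, the Spark DAG builds up a stack of 200 nested RDDs/iterators: each pairwise union wraps the previous result, so reading a row descends through every level.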
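The distinction is between building the nested structure and evaluating it. The sketch below is a toy model under stated assumptions, not Hail's or Spark's code: MergeIterator stands in for one pairwise ordered merge, ProbeIterator for a one-row table that records the deepest JVM stack seen when it is read, and balancedReduce is a hypothetical divide-and-conquer reduce. The reduce call returns immediately; the deep call chain only appears when rows are pulled through the nested iterators.

```scala
// Toy model of nested lazy merges. MergeIterator, ProbeIterator, balancedReduce,
// and run are illustrative names, not Hail or Spark classes.
object NestedMergeSketch {
  // Lazily merges two sorted iterators. Reading a value from a leaf goes
  // through a next()/head call on every merge level above that leaf.
  final class MergeIterator(l: Iterator[Int], r: Iterator[Int]) extends Iterator[Int] {
    private val left = l.buffered
    private val right = r.buffered
    def hasNext: Boolean = left.hasNext || right.hasNext
    def next(): Int =
      if (!right.hasNext || (left.hasNext && left.head <= right.head)) left.next()
      else right.next()
  }

  // A one-table stand-in that records the deepest JVM stack seen when it is read.
  final class ProbeIterator(values: Seq[Int], maxDepth: Array[Int]) extends Iterator[Int] {
    private val it = values.iterator
    def hasNext: Boolean = it.hasNext
    def next(): Int = {
      maxDepth(0) = math.max(maxDepth(0), Thread.currentThread.getStackTrace.length)
      it.next()
    }
  }

  // Divide-and-conquer reduce: nesting depth is ceil(log2(N)).
  def balancedReduce[A](xs: IndexedSeq[A])(op: (A, A) => A): A = {
    require(xs.nonEmpty, "balancedReduce of an empty sequence")
    if (xs.length == 1) xs.head
    else {
      val (l, r) = xs.splitAt(xs.length / 2)
      op(balancedReduce(l)(op), balancedReduce(r)(op))
    }
  }

  // Merges nTables one-row tables; returns (merged rows, deepest stack seen at a leaf).
  def run(nTables: Int, tree: Boolean): (List[Int], Int) = {
    val maxDepth = Array(0)
    val leaves: IndexedSeq[Iterator[Int]] =
      (0 until nTables).map(i => new ProbeIterator(Seq(i), maxDepth): Iterator[Int])
    val merged: Iterator[Int] =
      if (tree) balancedReduce(leaves)(new MergeIterator(_, _))
      else leaves.reduceLeft[Iterator[Int]](new MergeIterator(_, _)) // left-deep: N - 1 levels
    val rows = merged.toList // evaluation (and the deep stack) happens here, not in reduce
    (rows, maxDepth(0))
  }
}
```

With 200 one-row tables, the left-deep version reads its first table through 199 nested merges, while the balanced version never goes more than 8 merges deep, so the recorded stack depth is much larger for the linear reduce. Exact frame counts depend on the JVM and Scala version.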

patrick-schultz (Collaborator)

Ah, right, that stack.

danking merged commit 990e875 into hail-is:master on Aug 26, 2019